ABSTRACT

Markov chains with large transition probability matrices
occur in many applications such as manpower models. Under
certain conditions the state space of a stationary discrete
parameter finite Markov chain may be partitioned into
subsets, each of which may be treated as a single state of a
smaller chain that retains the Markov property. Such a chain
is said to be "lumpable" and the resulting lumped chain is a
special case of more general functions of Markov chains.

There are several reasons why one might wish to lump.
First, there may be analytical benefits, including relative
simplicity of the reduced model and development of a new
model which inherits known or assumed strong properties of
the original model (the Markov property). Second, there may
be statistical benefits, such as increased robustness of the
smaller chain as well as improved estimates of transition
probabilities. Finally, the identification of lumps may
provide new insights about the process under investigation.

However, a problem that arises in connection with practical
applications of Markov chain models is to determine
whether the chain is lumpable. This is especially difficult
when the matrix $P = \{p_{ij}\}$ of transition probabilities is
estimated from transition data. In this case, it is desirable
to find bounds on $\Delta$, the largest error
$|\hat{p}_{ij} - p_{ij}|$ in estimating $p_{ij}$, for all i and j.

This thesis examines the sensitivity of the lumping
conditions based on $\hat{P}$, the estimate of $P$. In general, it is
found that the classical lumping conditions are extremely
sensitive to the estimation error which can be expected to
occur even with large data sets. Thus, these conditions may
be of limited value in many actual applications.
I. INTRODUCTION

Markov chains with large transition probability matrices
occur in many applications such as manpower models. Under
certain conditions the state space of a stationary discrete
parameter finite Markov chain may be partitioned into
subsets, each of which may be treated as a single state of a
smaller chain that retains the Markov property. Such a chain
is said to be "lumpable" and the resulting lumped chain is a
special case of more general functions of Markov chains.

Consider a Markov chain $\{X_t : t = 0,1,2,\ldots\}$ with finite
state space $S = \{1,2,\ldots,n\}$, stationary transition
probability matrix $P = \{p_{ij}\}$, and a priori distribution of
"initial states", $p^0 = (p_1^0, p_2^0, \ldots, p_n^0)$. Let $\tilde{S}$ denote a
nontrivial partition of S into m < n "lumps", say
$\tilde{S} = \{L(1), L(2), \ldots, L(m)\}$. If $\{X_t\}$ is lumpable with respect
to $\tilde{S}$, denote by $\{\tilde{X}_t\}$ the lumped chain with state space $\tilde{S}$ and
transition probability matrix $\tilde{P}$.

A well-known characterization [Ref. 2] is that $\{X_t\}$ is
lumpable to $\{\tilde{X}_t\}$ if and only if there exist matrices A and B
such that

$$BAPB = PB \tag{1.1}$$

where B consists of m nonzero orthogonal n-dimensional
column vectors whose components are zeros or ones, and A is
$B'$ with rows normalized to probability vectors (i.e.,
$A = (B'B)^{-1}B'$). The positions of the 1's in each column of B
correspond to states in S that together form a lump in $\tilde{S}$.
It follows that if $BAPB = PB$ is satisfied, then $\tilde{P} = APB$, as
is shown in Chapter 2.
Many of the mathematical quantities associated with $\{X_t\}$
can be transformed directly to corresponding quantities for
$\{\tilde{X}_t\}$, using the lumping matrix B. In Chapter 2, for example,
we show that if an original Markov chain $\{X_t\}$ is lumpable to
$\{\tilde{X}_t\}$ and $\{\tilde{X}_t\}$ is further lumpable to $\{\tilde{\tilde{X}}_t\}$, then $\{X_t\}$ is
directly lumpable to $\{\tilde{\tilde{X}}_t\}$, and we give the lumping matrix
for $\{\tilde{\tilde{X}}_t\}$ in terms of the underlying two lumpings.


There are several reasons why one might wish to lump
[Ref. 1]. First, there may be analytical benefits,
including relative simplicity of the reduced model and
development of a new model which inherits known or assumed
strong properties of the original model (the Markov
property). Second, there may be statistical benefits, such as
increased robustness of the smaller chain as well as
improved estimates of transition probabilities. Finally,
the identification of lumps may provide new insights about
the process under investigation.

However, a problem that arises in connection with practical
applications of Markov chain models is to determine
whether the chain is lumpable. For chains with large state
spaces S, it is practically impossible to use an exhaustive
search to determine whether lumpability conditions such as
those given in equation (1.1) are met for some matrices B,
because of the large number of ways of partitioning S, i.e., the
large number of candidate B matrices. For example, if S has
10 elements, there are 115,975 partitions of S.


Another problem is to estimate the matrix $P = \{p_{ij}\}$ of
transition probabilities and to find bounds on $\Delta$, the
largest error $|\hat{p}_{ij} - p_{ij}|$ over all i and j. We shall
investigate the sensitivity of the lumping conditions in equation
(1.1) for varying $\Delta$. If $\{X_t\}$ is lumpable with lumping
matrix B, is condition (1.1) satisfied with P replaced by
$\hat{P}$?


This thesis will attempt to examine the sensitivity of
the lumping conditions based on reasonable estimation errors
$\Delta$ when P is not known and must be estimated by $\hat{P}$. We
describe these facts about lumpability using eigenvalues and
eigenvectors, including the theorem mentioned by D.R. Barr
and M.U. Thomas [Ref. 3]. We do not review elementary
concepts of Markov chains here; the reader may wish to
consult [Ref. 2] and [Ref. 4] for review of basic facts and
specific terminologies such as lumpability, regular Markov
chain, etc.


II. THEORY OF LUMPABILITY

This chapter will cover general facts about lumping, such
as conditions for lumping, the number of partitions
possible for any given size of state space S, and theorems
associated with eigenvector conditions for Markov chain
lumpability.

A. CONDITIONS FOR LUMPING

Consider a Markov chain $\{X_t : t = 0,1,2,\ldots\}$ with finite
state space $S = \{1,2,\ldots,n\}$, stationary transition
probability matrix $P = \{p_{ij}\}$, and a priori distribution of
"initial states", $p^0 = (p_1^0, p_2^0, \ldots, p_n^0)$. Let $\tilde{S}$ denote a
nontrivial partition of S into m < n "lumps", that is
$\tilde{S} = \{L(1), L(2), \ldots, L(m)\}$. If $\{X_t\}$ is lumpable with
respect to $\tilde{S}$, denote by $\{\tilde{X}_t\}$ the lumped chain with state
space $\tilde{S}$ and transition probability matrix $\tilde{P}$.

We now show that if the condition (1.1) for lumpability
with respect to the lumping matrix B,

$$BAPB = PB \tag{2.1}$$

is satisfied, then the lumped transition matrix $\tilde{P}$ is given
by

$$\tilde{P} = APB \tag{2.2}$$

Proof,  rr.  is  the  sum  Y  p,.  ,  waere  L  ( j)  is  the  partition 

1  fcfcUi) 

subset  containing  j  e  S  and  i  is  any  element  or  L  (1)  •  By 


the  lumpatibility  condition,  this  value  is  the  same  for  any 
i  6  I  (i) .  But  the  product  PB  sums  the  columns  of  P  in  accor¬ 
dance  with  the  partition  subsets  indicated  by  the  columns  of 
E.  Hence,  P3  is  an  n  x  m  matrix  with  rows  repeated  in  accor¬ 
dance  with  the  partition  sets  1(1),  1(2),  ...  ,  L  (m)  ;  the 
effect  of  t-re-multip lying  by  A  =  is  to  "average" 
these  common  rows  yielding  an  m  x  m  matrix  T  without  the 
repeated  rows.  But  such  ’’averages"  are  just  the  common  rows 
being  averaged.  Hence,  T  =  APB  is  the  m  x  m  transition 
matrix  of  the  lumped  chain  with  state  space 
(1(1)  ,1(2)  ,...,!(■)}  . 


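Example 1. Consider a transition probability matrix P
with 4 states which can be partitioned into
$\tilde{S} = \{\{1\}, \{2,3\}, \{4\}\} = \{L(1), L(2), L(3)\}$. Let

$$P = \begin{bmatrix} 1/4 & 1/16 & 3/16 & 1/2 \\ 0 & 1/12 & 1/12 & 5/6 \\ 0 & 1/12 & 1/12 & 5/6 \\ 7/8 & 1/32 & 3/32 & 0 \end{bmatrix}, \quad
B = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
A = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0.5 & 0.5 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}.$$

We know equation (2.1) is satisfied with partitioning
$\tilde{S} = \{L(1), L(2), L(3)\}$. Thus, the lumped transition matrix is

$$APB = \tilde{P} = \begin{bmatrix} 0.25 & 0.25 & 0.5 \\ 0 & 0.167 & 0.833 \\ 0.875 & 0.125 & 0 \end{bmatrix}.$$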
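The check in Example 1 can be reproduced with exact rational arithmetic. The following is a minimal sketch, not part of the original development; the helper names `matmul` and `left_inverse` are hypothetical, and the matrix A is built as $B'$ with rows normalized, as described for equation (1.1).

```python
from fractions import Fraction as F

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def left_inverse(B):
    """A = (B'B)^{-1} B': the transpose of B with each row scaled to sum to one."""
    return [[F(v, sum(col)) for v in col] for col in zip(*B)]

# Transition matrix of Example 1 (states 1..4).
P = [[F(1, 4), F(1, 16), F(3, 16), F(1, 2)],
     [F(0),    F(1, 12), F(1, 12), F(5, 6)],
     [F(0),    F(1, 12), F(1, 12), F(5, 6)],
     [F(7, 8), F(1, 32), F(3, 32), F(0)]]

# Lumping matrix for the partition {1}, {2,3}, {4}.
B = [[1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1]]

A = left_inverse(B)
PB = matmul(P, B)
is_lumpable = matmul(B, matmul(A, PB)) == PB   # condition (2.1)
P_lumped = matmul(A, PB)                       # equation (2.2)

print(is_lumpable)
print([[str(v) for v in row] for row in P_lumped])
```

Using `Fraction` avoids rounding, so the equality test for condition (2.1) is exact; with floating point one would compare against a tolerance instead.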

Many of the mathematical quantities associated with $\{X_t\}$
can be transformed directly to corresponding quantities for
$\{\tilde{X}_t\}$, using A and B of equation (2.1). For example, since
AB is the m-dimensional identity matrix, it follows that for
s a positive integer,

$$(\tilde{P})^s = (APB)^s = A(PB)A(PB) \cdots A(PB) = AP^sB \tag{2.3}$$

We now show that $(\tilde{P})^s = AP^sB$:

$$\begin{aligned}
(\tilde{P})^s &= (APB)(APB)(APB) \cdots (APB) \\
&= AP(BAPB)(APB) \cdots (APB) \\
&= AP(PB)(APB) \cdots (APB) \\
&= AP^2(BAPB) \cdots (APB) \\
&= AP^2\,PB \cdots (APB) \\
&\;\;\vdots \\
&= AP^sB .
\end{aligned}$$

But $AP^sB$ is also the lumped form of $P^s$, since

$$\begin{aligned}
BAPB &= PB \\
PBAPB &= P^2B \\
BAPBAPB &= P^2B \\
BAP^2B &= P^2B \\
&\;\;\vdots \\
BAP^sB &= P^sB ,
\end{aligned}$$

so $P^s$ is lumpable with the same matrix B and its lumped matrix is $AP^sB$.
This implies in turn that if $\{X_t\}$ has steady state
distribution $\pi$, then $\{\tilde{X}_t\}$ has steady state distribution $\tilde{\pi} = \pi B$.

Theorem 1. The steady state distribution $\tilde{\pi}$ of the
lumped chain $\{\tilde{X}_t\}$ is $\pi B$ where $\pi = \pi P$.

Proof.
$$\pi B = (\pi P)B = \pi B(APB) = (\pi B)\tilde{P}$$

Therefore, $\tilde{\pi} = \pi B$.
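Theorem 1 can be checked numerically on the four-state chain with rows (1/4, 1/16, 3/16, 1/2), (0, 1/12, 1/12, 5/6), (0, 1/12, 1/12, 5/6), (7/8, 1/32, 3/32, 0) and lumps {1}, {2,3}, {4}. The sketch below is illustrative only: the stationary vector `pi` is supplied as exact fractions (an assumption of this sketch, verified inside the code by testing $\pi P = \pi$), and `vecmat` is a hypothetical helper.

```python
from fractions import Fraction as F

def vecmat(x, M):
    """Row vector times matrix (matrix stored as a list of rows)."""
    return [sum(a * b for a, b in zip(x, col)) for col in zip(*M)]

P = [[F(1, 4), F(1, 16), F(3, 16), F(1, 2)],
     [F(0),    F(1, 12), F(1, 12), F(5, 6)],
     [F(0),    F(1, 12), F(1, 12), F(5, 6)],
     [F(7, 8), F(1, 32), F(3, 32), F(0)]]
B = [[1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1]]
# Lumped matrix APB for the partition {1}, {2,3}, {4}.
P_lumped = [[F(1, 4), F(1, 4), F(1, 2)],
            [F(0),    F(1, 6), F(5, 6)],
            [F(7, 8), F(1, 8), F(0)]]

# Candidate stationary vector of P (approximately 0.4375, 0.0547, 0.1328, 0.375).
pi = [F(7, 16), F(7, 128), F(17, 128), F(3, 8)]
pi_is_stationary = vecmat(pi, P) == pi

# Theorem 1: the lumped stationary vector is pi B.
pi_lumped = vecmat(pi, B)
lumped_is_stationary = vecmat(pi_lumped, P_lumped) == pi_lumped

print(pi_is_stationary, lumped_is_stationary)
print([str(v) for v in pi_lumped])
```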
Similarly, the a priori distribution $\tilde{p}^0$ of the initial
state of the lumped chain corresponding to that of the
original chain, $p^0$, is given by $\tilde{p}^0 = p^0B$, since by equations
(2.1) and (2.3),

$$\begin{aligned}
p^0P^sB &= p^0P^{s-1}PB \\
&= p^0P^{s-1}BAPB \\
&= p^0P^{s-1}B\tilde{P} \\
&\;\;\vdots \\
&= p^0B\tilde{P}^s .
\end{aligned}$$

Note that $p^0P^sB$ is the distribution of lumped states
occupied by the lumped chain after s transitions. Since this
equals $p^0B\tilde{P}^s = \tilde{p}^0\tilde{P}^s$, it follows that $\tilde{p}^0 = p^0B$.

B. PARTITIONS OF A SET OF STATES

The matrix B consists of m nonzero orthogonal
n-dimensional column vectors whose components are zeros and
ones which determine a specific partition of
$S = \{1,2,\ldots,n\}$. Example 1 illustrates this, where the
state space $S = \{1,2,3,4\}$ is partitioned into
$\tilde{S} = \{\{1\}, \{2,3\}, \{4\}\} = \{L(1), L(2), L(3)\}$, and

$$B = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$

Permutations of these columns give a matrix which also
lumps $\{X_t\}$. In order to see this, let $B^*$ be B with columns
permuted in some order. Then $B^* = BI^*$, where $I^*$ is the
identity matrix with its columns permuted in the same order. Now
if $BAPB = PB$, then

$$\begin{aligned}
B^*A^*PB^* &= B^*(B^{*\prime}B^*)^{-1}B^{*\prime}PB^* \\
&= BI^*(I^{*\prime}B'BI^*)^{-1}I^{*\prime}B'PBI^* \\
&= BI^*(I^*)^{-1}(B'B)^{-1}(I^{*\prime})^{-1}I^{*\prime}B'PBI^* \\
&= B(B'B)^{-1}B'PBI^* \\
&= BAPBI^* \\
&= PBI^* \\
&= PB^* ,
\end{aligned}$$

so it follows that $\{X_t\}$ is also lumpable with respect to the
matrix $B^*$.

Now, how many candidate lumping matrices are there? This
would be the number of partitions of S. [Ref. 5] gives a
recursion relation for the number $A_N$ of ways of partitioning
a set $S = \{1,2,\ldots,N\}$:

$$A_N = \sum_{k=0}^{N-1} \binom{N-1}{k} A_k \qquad (N \ge 1,\ A_0 = 1) \tag{2.4}$$

From this relation we find $A_1 = 1$, $A_2 = 2$, $A_3 = 5$, $A_4 = 15$, etc.
The sizes of the entries in Table 1 show that it would be
impossible to use a trial and error approach to finding
lumping matrices B for lumping a chain with larger state
spaces, say with 10 or more elements. Values of $A_N$ for
larger N are shown in Table 1.
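The recursion for the number of partitions is short enough to evaluate directly. The sketch below assumes the standard form $A_N = \sum_{k=0}^{N-1} \binom{N-1}{k} A_k$ with $A_0 = 1$; the function name `partition_counts` is hypothetical.

```python
from math import comb

def partition_counts(n_max):
    """Return [A_0, ..., A_n_max] using A_N = sum_{k=0}^{N-1} C(N-1, k) * A_k."""
    counts = [1]  # A_0 = 1
    for n in range(1, n_max + 1):
        counts.append(sum(comb(n - 1, k) * counts[k] for k in range(n)))
    return counts

counts = partition_counts(20)
print(counts[1:11])   # A_1 .. A_10
print(counts[20])     # number of partitions of a 20-element state space
```

Python integers are unbounded, so the same function gives exact counts for the larger state spaces as well; the growth of these counts is what rules out exhaustive search over candidate lumping matrices.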


TABLE 1
Partitions of a Set of N States

 N    Partitions       N    Partitions
 5    52              20    5.172415 x 10^13
 6    203             30    8.467490145 x 10^23
 7    877             40    1.574505884 x 10^35
 8    4140            50    1.857242688 x 10^47
 9    21147           60    9.769393075 x 10^59
10    115975          70    1.80750039 x 10^73

It is of interest to be able to systematically prescribe
alternative lumpings by generating matrices B for a given
transition matrix P, using some method other than trial and
error. In the next section, we describe an approach to
finding B matrices using the eigenvalues and eigenvectors of
P.


C. AN EIGENVECTOR CONDITION FOR MARKOV CHAIN LUMPABILITY

Many problems in science and mathematics deal with a
linear operator $T : V \to V$, and it is of importance to
determine those scalars $\lambda$ for which the equation $Tx = \lambda x$ has
nonzero solutions x. In this section we discuss this
problem and its relationship with finding matrices B.

Theorem 2. The value 1 is always an eigenvalue for any
Markov chain transition probability matrix.

Proof. Let P be any n x n transition probability matrix
of $\{X_t\}$, x be a left eigenvector in $R^n$, and $\lambda$ be the
corresponding eigenvalue of P. Then $xP = x\lambda$, which is equivalent
to

$$x(P - \lambda I) = 0 \tag{2.5}$$

For $\lambda$ to be an eigenvalue, there must be a nonzero solution
x of equation (2.5). Equation (2.5) will have a nonzero
solution if and only if

$$\det(P - \lambda I) = 0 . \tag{2.6}$$

This is called the characteristic equation. To show that
$\lambda = 1$ always satisfies equation (2.6), we need only show
that the columns of the matrix in equation (2.6) are
linearly dependent. Note that

$$P - I =
\begin{bmatrix} p_{11} & p_{12} & \cdots & p_{1n} \\ p_{21} & p_{22} & \cdots & p_{2n} \\ \vdots & \vdots & & \vdots \\ p_{n1} & p_{n2} & \cdots & p_{nn} \end{bmatrix}
-
\begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix}
=
\begin{bmatrix} p_{11}-1 & p_{12} & \cdots & p_{1n} \\ p_{21} & p_{22}-1 & \cdots & p_{2n} \\ \vdots & \vdots & & \vdots \\ p_{n1} & p_{n2} & \cdots & p_{nn}-1 \end{bmatrix} \tag{2.7}$$

Since $\sum_{j=1}^{n} p_{ij} = 1$ for Markov chains, it follows that the rows
in equation (2.7) sum to zero, so the determinant in
equation (2.6) is zero with $\lambda = 1$. It follows that $\lambda = 1$ is an
eigenvalue of the Markov chain $\{X_t\}$. We'd next like to see
properties of eigenvectors corresponding to the eigenvalue
$\lambda = 1$.

Theorem 3. For any regular Markov chain, components of
the eigenvector corresponding to $\lambda = 1$ are proportional to
the steady state distribution of $\{X_t\}$.
Proof. Let x be a left eigenvector of P, and $\lambda$ be the
corresponding eigenvalue of P, such that $xP = x\lambda$, and
assume $\sum_i x_i = 1$. For given $\lambda = 1$, $xP = x$. The steady state
distribution of $\{X_t\}$ is unique [Ref. 4]. Therefore, x must
be the steady state distribution $\pi$ since $\sum_i x_i = 1$.

The  following  example  demonstrates  Theorem  3. 

Example 2. Let

$$P = \begin{bmatrix} 1/4 & 1/16 & 3/16 & 1/2 \\ 0 & 1/12 & 1/12 & 5/6 \\ 0 & 1/12 & 1/12 & 5/6 \\ 7/8 & 1/32 & 3/32 & 0 \end{bmatrix}.$$

The eigenvectors corresponding to the eigenvalues of P are
displayed as column vectors below:

Eigenvalues     1          0         -0.25     -0.3333

Eigenvectors    0.7367     0         10.5       0.8247
                0.09209   -0.7201    -0.375    -0.03436
                0.2236     0.7201    -4.125    -0.2405
                0.6315     0         -6        -0.5498

Note that $\pi = (\pi_1, \pi_2, \pi_3, \pi_4) = (0.4375, 0.0547, 0.1328, 0.375)$,
where, for example,

$$\pi_1 = \frac{0.7367}{0.7367 + 0.09209 + 0.2236 + 0.6315} .$$


Theorem 4. Eigenvectors corresponding to eigenvalues
other than 1 are orthogonal to e = (1,1,...,1).

Proof. $xe' = x(Pe') = (xP)e' = \lambda xe'$. Therefore, $xe'$
must be zero for $\lambda \ne 1$.

We are also interested in finding the relationship
between eigenvalues of P and those of the lumped transition
probability matrix $\tilde{P}$, where $\tilde{P} = APB$ as described above.
Theorem 5. Suppose $\{X_t\}$ with transition matrix P is
lumpable to $\{\tilde{X}_t\}$ with transition matrix $\tilde{P}$. The eigenvalues
of $\tilde{P}$ are eigenvalues of P.

Proof. Let $\phi(\lambda) = 0$ be the ($n^{th}$ degree) characteristic
equation of P. By the Cayley-Hamilton theorem [Ref. 6],

$$\phi(P) = a_nP^n + \cdots + a_1P + a_0I = 0 ,$$

which together with equation (2.3) implies

$$A\phi(P)B = a_n\tilde{P}^n + \cdots + a_1\tilde{P} + a_0I = \phi(\tilde{P}) ,$$

so that $\phi(\tilde{P}) = 0$. Since $\tilde{P}$ satisfies P's characteristic equation and since
eigenvalues of $\phi(\tilde{P})$ are of the form $\phi(\tilde{\lambda})$, it follows that
$\phi(\tilde{\lambda}) = 0$. Thus all eigenvalues $\tilde{\lambda}$ of $\tilde{P}$ are also
eigenvalues of P.
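Theorem 5 can be illustrated on the four-state chain with rows (1/4, 1/16, 3/16, 1/2), (0, 1/12, 1/12, 5/6), (0, 1/12, 1/12, 5/6), (7/8, 1/32, 3/32, 0), lumped by the partition {1}, {2,3}, {4}. The sketch below is an illustration under stated assumptions rather than part of the proof: the eigenvalues 1, -1/4 and -1/3 of the lumped matrix are supplied by hand, and the code only confirms that they are roots of both characteristic equations. The helpers `det` and `char_poly_at` are hypothetical names.

```python
from fractions import Fraction as F

def det(M):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j, a in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * a * det(minor)
    return total

def char_poly_at(M, lam):
    """Evaluate det(M - lam * I)."""
    n = len(M)
    return det([[M[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)])

P = [[F(1, 4), F(1, 16), F(3, 16), F(1, 2)],
     [F(0),    F(1, 12), F(1, 12), F(5, 6)],
     [F(0),    F(1, 12), F(1, 12), F(5, 6)],
     [F(7, 8), F(1, 32), F(3, 32), F(0)]]
P_lumped = [[F(1, 4), F(1, 4), F(1, 2)],
            [F(0),    F(1, 6), F(5, 6)],
            [F(7, 8), F(1, 8), F(0)]]

lumped_eigs = [F(1), F(-1, 4), F(-1, 3)]
roots_of_lumped = [char_poly_at(P_lumped, lam) == 0 for lam in lumped_eigs]
roots_of_original = [char_poly_at(P, lam) == 0 for lam in lumped_eigs]
print(roots_of_lumped, roots_of_original)

# The converse fails: 0 is an eigenvalue of P (two identical rows) but not of P_lumped.
print(char_poly_at(P, F(0)), char_poly_at(P_lumped, F(0)))
```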

We next examine the eigenvectors of P and $\tilde{P}$, with the
aim of identifying lumpings of $\{X_t\}$ directly in terms of the
eigenvectors of P. We have seen that $\tilde{p}^0$ is obtained directly
as $p^0B$; a similar relationship holds with eigenvectors of P.

Theorem 6. Suppose x is a left eigenvector of P
corresponding to eigenvalue $\lambda$, and suppose $\{X_t\}$ is lumpable to a
chain with transition matrix $\tilde{P} = APB$. Then xB satisfies the
equation $(xB)\tilde{P} = (xB)\lambda$.

Proof. By equation (2.1), $(xB)\tilde{P} = xBAPB = xPB$. But
$xP = x\lambda$, and the result follows.

We note that xB is not necessarily an eigenvector of $\tilde{P}$
because it may be zero. In fact, it easily follows that
xB = 0 if $\lambda$ is not an eigenvalue of $\tilde{P}$. But xB may be null
even if $\lambda$ is an eigenvalue of $\tilde{P}$, in cases where $\lambda$ is a
repeated eigenvalue of P more times than of $\tilde{P}$.
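Both situations described above occur in the four-state chain with rows (1/4, 1/16, 3/16, 1/2), (0, 1/12, 1/12, 5/6), (0, 1/12, 1/12, 5/6), (7/8, 1/32, 3/32, 0) and lumps {1}, {2,3}, {4}. In the sketch below the two left eigenvectors are supplied by hand as exact fractions (an assumption of this sketch, verified in the code); `vecmat` is a hypothetical helper.

```python
from fractions import Fraction as F

def vecmat(x, M):
    """Row vector times matrix (matrix stored as a list of rows)."""
    return [sum(a * b for a, b in zip(x, col)) for col in zip(*M)]

P = [[F(1, 4), F(1, 16), F(3, 16), F(1, 2)],
     [F(0),    F(1, 12), F(1, 12), F(5, 6)],
     [F(0),    F(1, 12), F(1, 12), F(5, 6)],
     [F(7, 8), F(1, 32), F(3, 32), F(0)]]
B = [[1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1]]
P_lumped = [[F(1, 4), F(1, 4), F(1, 2)],
            [F(0),    F(1, 6), F(5, 6)],
            [F(7, 8), F(1, 8), F(0)]]

# Left eigenvector of P for eigenvalue -1/4, which is shared with P_lumped.
lam = F(-1, 4)
x = [F(21, 2), F(-3, 8), F(-33, 8), F(-6)]
x_is_eigenvector = vecmat(x, P) == [lam * v for v in x]
xB = vecmat(x, B)
xB_is_eigenvector = vecmat(xB, P_lumped) == [lam * v for v in xB]

# Left eigenvector of P for eigenvalue 0, which P_lumped does not have: xB vanishes.
x0 = [0, -1, 1, 0]
x0_is_eigenvector = vecmat(x0, P) == [0, 0, 0, 0]
x0B = vecmat(x0, B)

print(x_is_eigenvector, xB_is_eigenvector, [str(v) for v in xB])
print(x0_is_eigenvector, x0B)
```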



[Ref. 7] pointed out some other useful properties
associated with eigenvalues and eigenvectors, such as: 1) if the
matrix P is symmetric, then eigenvalues are real and
eigenvectors are different for repeated eigenvalues, 2) if the
matrix is not symmetric, then the eigenvectors may be the same
for repeated eigenvalues.

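Theorem 7. If $\{X_t\}$ with transition matrix P is lumpable
to $\{\tilde{X}_t\}$ with transition matrix $\tilde{P}$, and $\{\tilde{X}_t\}$ is lumpable to
$\{\tilde{\tilde{X}}_t\}$ with transition matrix $\tilde{\tilde{P}}$, then $\{X_t\}$ is directly
lumpable to $\{\tilde{\tilde{X}}_t\}$, where $\{\tilde{\tilde{X}}_t\}$ is the lumped chain of $\{\tilde{X}_t\}$.

Proof. Let $\{X_t\}$ be lumpable to $\{\tilde{X}_t\}$, and $\{\tilde{X}_t\}$ be
lumpable to $\{\tilde{\tilde{X}}_t\}$, by matrices $B_1$ and $B_2$, where $B_1$ and $B_2$ are
lumping matrices in which the dimension n x m of $B_1$ is
greater than that of $B_2$. By equation (2.2), $\tilde{P} = A_1PB_1$ and
$\tilde{\tilde{P}} = A_2\tilde{P}B_2$. Thus,

$$\tilde{\tilde{P}} = A_2\tilde{P}B_2 = A_2(A_1PB_1)B_2 = (A_2A_1)P(B_1B_2) .$$

To see that $B_1B_2$ is a lumping matrix and $A_2A_1$ is of the
required form, we need to show that $(A_2A_1)(B_1B_2)$ is the
identity matrix as mentioned in Section A. But

$$(A_2A_1)(B_1B_2) = A_2(A_1B_1)B_2 = A_2IB_2 = A_2B_2 = I .$$

Also, note that $B_1B_2$ is $B_1$ lumped by $B_2$, so $B_1B_2$ has columns
of the required form. The lumping condition itself carries over, since
$PB_1B_2 = (B_1A_1PB_1)B_2 = B_1\tilde{P}B_2 = B_1(B_2A_2\tilde{P}B_2) = (B_1B_2)(A_2A_1)P(B_1B_2)$.
Therefore, $\{X_t\}$ is directly lumpable
to $\{\tilde{\tilde{X}}_t\}$, by the lumping matrix $B = B_1B_2$.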


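Example 3. Consider a Markov chain with 5 states, and
transition probability matrix

$$P = \begin{bmatrix} 0.3 & 0.1 & 0.2 & 0.1 & 0.3 \\ 0.1 & 0.3 & 0.1 & 0.3 & 0.2 \\ 0.5 & 0.1 & 0 & 0.1 & 0.3 \\ 0.1 & 0.5 & 0.2 & 0.1 & 0.1 \\ 0.5 & 0 & 0.1 & 0.2 & 0.2 \end{bmatrix}.$$

First, consider $S = \{1,2,3,4,5\}$, which can be partitioned to
$\tilde{S} = \{\{1\}, \{2,4\}, \{3,5\}\} = \{L(1), L(2), L(3)\}$. The corresponding
lumping matrices are

$$B_1 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\quad \text{and} \quad
A_1 = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 0.5 & 0 & 0.5 & 0 \\ 0 & 0 & 0.5 & 0 & 0.5 \end{bmatrix},$$

and the lumped transition probability matrix is

$$\tilde{P} = A_1PB_1 = \begin{bmatrix} 0.3 & 0.2 & 0.5 \\ 0.1 & 0.6 & 0.3 \\ 0.5 & 0.2 & 0.3 \end{bmatrix}.$$

Secondly, consider $\tilde{S}$ with 3 states, which can be partitioned
to $\tilde{\tilde{S}} = \{\{1,3\}, \{2\}\} = \{L'(1), L'(2)\}$, with matrices

$$B_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 0 \end{bmatrix}
\quad \text{and} \quad
A_2 = \begin{bmatrix} 0.5 & 0 & 0.5 \\ 0 & 1 & 0 \end{bmatrix}.$$

The corresponding lumped transition matrix is

$$\tilde{\tilde{P}} = A_2\tilde{P}B_2 = \begin{bmatrix} 0.8 & 0.2 \\ 0.4 & 0.6 \end{bmatrix}.$$

Finally, consider lumping the transition probability matrix
directly. For the partitioning
$\{\{1,3,5\}, \{2,4\}\} = \{L''(1), L''(2)\}$,

$$B = B_1B_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 0 \\ 0 & 1 \\ 1 & 0 \end{bmatrix},
\qquad
A = (B'B)^{-1}B' = \begin{bmatrix} 1/3 & 0 & 1/3 & 0 & 1/3 \\ 0 & 1/2 & 0 & 1/2 & 0 \end{bmatrix}.$$

The product $A_2A_1$, whose rows are (1/2, 0, 1/4, 0, 1/4) and (0, 1/2, 0, 1/2, 0),
is a different left inverse of B, but it yields the same lumped matrix
because the rows of PB within each lump are identical. The directly lumped
transition probability matrix is

$$\tilde{\tilde{P}} = APB = \begin{bmatrix} 0.8 & 0.2 \\ 0.4 & 0.6 \end{bmatrix}.$$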

Theorem 7 shows that lumping is "transitive", in the
following sense. Define two transition matrices P and Q to
be equivalent, ($P \equiv Q$), if $Q = \tilde{P}$ for a lumping matrix B
whose columns are those of the identity matrix, in some
permuted order. (Thus the chains $\{X_t\}$ and $\{Y_t\}$ differ only in
the labels associated with their states.) Define a relation
"$\le$" between transition matrices as follows: $Q \le P$ if and
only if $Q = \tilde{P}$ for some lumping matrix B. Then Theorem 7
shows that $Q \le P$, $R \le Q \Rightarrow R \le P$. This relation "$\le$" is
reflexive, since $Q \le Q$ using the lumping matrix I
(identity). Finally, "$\le$" is antisymmetric since $Q \le P$ and
$P \le Q \Rightarrow P \equiv Q$. Thus, the set of all transition probability
matrices is partially ordered by the "lumping" partial
order "$\le$".


III. BOUNDS ON THE LARGEST ERROR, $\Delta$, IN P

In this chapter we consider three procedures to find
bounds on $\Delta$. First, we use the central limit theorem for
given i and j. Secondly, we use a binomial approximation on
the basis of the first procedure. Finally, we get the
largest error $\Delta$, using the asymptotic extreme value
distribution. These three approximations are only designed to give
a rough idea of the relationships between $\Delta$ and the number
M of elements in the state space, the total number of
observed transitions $K_{..}$, and the probability $\alpha$.

A. APPROACH USING CENTRAL LIMIT THEOREM

We are interested in the sizes of the errors between the
estimate $\hat{P}$ and the unknown P, where P is the transition
probability matrix of $\{X_t\}$. We assume the transition
probability matrix P is of size M x M.

Let $K_{ij}$ be the number of observed transitions of $\{X_t\}$
from state i to state j, and let $K_{i.}$ be the number of
observed transitions from state i. Similarly $K_{.j}$ is the
number of observed transitions into state j.

Let $p_{ij}$ be an unknown transition probability from state i
to state j and $\hat{p}_{ij}$ be an estimate of $p_{ij}$ based on $K_{..}$ observed
transitions. Then the usual estimate $\hat{p}_{ij}$ of $p_{ij}$ is the ratio
of $K_{ij}$ to $K_{i.}$. Now, as a rough approximation, imagine that
$K_{i.}$ is fixed, and the number of transitions from state i to
state j, $K_{ij}$, is Binomial($K_{i.}$, $p_{ij}$). Then by the central
limit theorem,


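$$\hat{p}_{ij} \text{ is approximately Normal}\left(p_{ij},\ \frac{p_{ij}(1-p_{ij})}{K_{i.}}\right)$$

since

$$E[\hat{p}_{ij}] = E\left[\frac{K_{ij}}{K_{i.}}\right] = \frac{K_{i.}\,p_{ij}}{K_{i.}} = p_{ij} , \quad \text{and}$$

$$\mathrm{Var}[\hat{p}_{ij}] = \mathrm{Var}\left[\frac{K_{ij}}{K_{i.}}\right] = \frac{\mathrm{Var}[K_{ij}]}{(K_{i.})^2} = \frac{K_{i.}\,p_{ij}(1-p_{ij})}{(K_{i.})^2} = \frac{p_{ij}(1-p_{ij})}{K_{i.}} . \tag{3.1}$$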


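We want to find a bound $\Delta$ on the estimation error $|\hat{p}_{ij} - p_{ij}|$
which occurs with probability at least $\alpha$; that is, the
largest $\Delta$ for which

$$P[\,|\hat{p}_{ij} - p_{ij}| \ge \Delta\,] \ge \alpha .$$

Now

$$P[\,|\hat{p}_{ij} - p_{ij}| \ge \Delta\,] = P\left[\frac{|\hat{p}_{ij} - p_{ij}|}{\sqrt{p_{ij}(1-p_{ij})/K_{i.}}} \ge \frac{\Delta}{\sqrt{p_{ij}(1-p_{ij})/K_{i.}}}\right] . \tag{3.2}$$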


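Let

$$Z = \frac{\hat{p}_{ij} - p_{ij}}{\sqrt{p_{ij}(1-p_{ij})/K_{i.}}} ;$$

then Z is approximately standard Normal. Rewrite equation (3.2)
as

$$P\left[\,|Z| \ge \frac{\Delta}{\sqrt{p_{ij}(1-p_{ij})/K_{i.}}}\right] \ge \alpha ; \qquad 0 < \alpha < 1 .$$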


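Equation (3.2) is approximately

$$P\left[Z \ge \frac{\Delta}{\sqrt{p_{ij}(1-p_{ij})/K_{i.}}}\right] \ge \frac{\alpha}{2}$$

since the Normal distribution is symmetric. Solving for $\Delta$,
we have

$$\Delta \le N^{-1}\left(1 - \frac{\alpha}{2}\right)\sqrt{\frac{p_{ij}(1-p_{ij})}{K_{i.}}}$$

where $N^{-1}(1 - \frac{\alpha}{2})$ is the $(1 - \frac{\alpha}{2})$ quantile of the
standard Normal distribution. Suppose the steady state
probability $\pi_i$ of state i is based on the equally likely
case, $\pi_i = 1/M$, and suppose the worst case in which $p_{ij} = 0.5$.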

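Then an approximate value for $\Delta$ is given by

$$\begin{aligned}
\Delta &= N^{-1}\left(1 - \frac{\alpha}{2}\right)\sqrt{\frac{p_{ij}(1-p_{ij})}{K_{i.}}} \\
&= N^{-1}\left(1 - \frac{\alpha}{2}\right)(0.5)\frac{1}{\sqrt{K_{i.}}} \\
&= N^{-1}\left(1 - \frac{\alpha}{2}\right)(0.5)\frac{1}{\sqrt{\pi_i K_{..}}} \\
&= N^{-1}\left(1 - \frac{\alpha}{2}\right)(0.5)\sqrt{\frac{M}{K_{..}}}
\end{aligned} \tag{3.3}$$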
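The single-entry bound is easy to evaluate numerically. The sketch below assumes the worst-case form $\Delta = N^{-1}(1 - \alpha/2)\,(0.5)\sqrt{M/K_{..}}$, with equally likely states and $p_{ij} = 0.5$; the function name `delta_fixed_pair` and the sample inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def delta_fixed_pair(alpha, M, K_total):
    """Worst-case bound for one fixed (i, j): N^{-1}(1 - alpha/2) * 0.5 * sqrt(M / K..)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # standard Normal quantile
    return z * 0.5 * sqrt(M / K_total)

# Illustrative inputs: alpha = 0.90, M = 20 states, K.. = 5000 observed transitions.
d_20 = delta_fixed_pair(0.90, 20, 5000)
d_30 = delta_fixed_pair(0.90, 30, 5000)
print(round(d_20, 5), round(d_30, 5))
```

The bound grows with the number of states M and shrinks as the total number of observed transitions increases, since each state is assumed to receive about $K_{..}/M$ of the observations.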


Equation (3.3) concerns the error $|\hat{p}_{ij} - p_{ij}|$ for fixed i
and j. We'd now like to find an error bound $\Delta$ over all i
and j. That is, we wish to find the largest $\Delta$ for which

$$P[\,|\hat{p}_{ij} - p_{ij}| \ge \Delta \text{ for some i and j}\,] \ge \alpha ,$$

which is roughly the same as

$$P[\,\hat{p}_{ij} - p_{ij} \ge \Delta \text{ for some i and j}\,] \ge \frac{\alpha}{2} . \tag{3.4}$$

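We apply the binomial approximation in equation (3.4), treating
the $M^2$ errors as independent, so that

$$\begin{aligned}
P[\,\hat{p}_{ij} - p_{ij} \ge \Delta \text{ for some i and j}\,]
&= 1 - P[\,\hat{p}_{ij} - p_{ij} < \Delta \text{ for all i and j}\,] \\
&= 1 - \left(1 - \frac{\alpha}{2}\right)^{M^2} .
\end{aligned} \tag{3.5}$$

Let $1 - (1 - \frac{\alpha}{2})^{M^2} = \beta$ for some $0 < \beta < 1$. Solving for $\alpha$
gives

$$\alpha = 2\left[1 - (1 - \beta)^{1/M^2}\right] . \tag{3.6}$$

Substitute the value of $\alpha$ in equation (3.6) into equation
(3.3). Finally we get the approximate bound $\Delta$ for all i and
j:

$$\Delta \approx N^{-1}\left((1 - \beta)^{1/M^2}\right)(0.5)\sqrt{\frac{M}{K_{..}}} . \tag{3.7}$$

Equation (3.7) gives an approximate expression for $\Delta$, using the
binomial approximation.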
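The overall bound from the binomial approximation can be sketched in the same way. The code below assumes that the bound takes the form $\Delta \approx N^{-1}\big((1-\beta)^{1/M^2}\big)\,(0.5)\sqrt{M/K_{..}}$, where $\beta$ is the probability that at least one of the $M^2$ errors exceeds $\Delta$; this form, the symbol $\beta$, and the function name `delta_overall_binomial` are assumptions of the sketch.

```python
from math import sqrt
from statistics import NormalDist

def delta_overall_binomial(beta, M, K_total):
    """Bound over all M*M entries, assuming independent errors and worst-case variance."""
    quantile_level = (1 - beta) ** (1 / M**2)   # equals 1 - alpha/2 for a single entry
    z = NormalDist().inv_cdf(quantile_level)
    return z * 0.5 * sqrt(M / K_total)

# Illustrative inputs: overall probability 0.90, K.. = 5000 observed transitions.
b_20 = delta_overall_binomial(0.90, 20, 5000)
b_30 = delta_overall_binomial(0.90, 30, 5000)
print(round(b_20, 3), round(b_30, 3))
```

For these inputs the bound is of the order of 0.1, far larger than the bound for a single fixed entry at the same probability level.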


B. APPROACH USING ORDER STATISTICS

Assume $Z_1, Z_2, \ldots, Z_M$ are independent continuous
random variables, each with density function $f_Z(z)$ and
distribution $F_Z(z)$. Now let $Z_{(1)}, Z_{(2)}, \ldots, Z_{(M)}$ denote
their ordered values, from smallest $Z_{(1)}$ to largest $Z_{(M)}$;
these are called the order statistics of $Z_1, Z_2, \ldots, Z_M$.
We now consider the probability law for $Z_{(M)}$ [Ref. 8], the
largest or maximum value.

The event $\{Z_{(M)} \le z\}$ occurs if and only if the event
$\{Z_1 \le z, Z_2 \le z, \ldots, Z_M \le z\}$ occurs, since if the largest
Z is smaller than z, all M of the random variables must be
smaller than z, where z is any fixed real number. The
distribution function for $Z_{(M)}$ is

$$\begin{aligned}
F_{Z_{(M)}}(z) &= P[Z_{(M)} \le z] \\
&= P[Z_1 \le z, Z_2 \le z, \ldots, Z_M \le z] \\
&= P[Z_1 \le z]\,P[Z_2 \le z] \cdots P[Z_M \le z]
\end{aligned}$$

since $Z_1, Z_2, \ldots, Z_M$ are assumed independent. But each of
$Z_1, Z_2, \ldots, Z_M$ has the same distribution $F_Z(z)$. So

$$F_{Z_{(M)}}(z) = [F_Z(z)]^M .$$

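The density function for $Z_{(M)}$ then is

$$\begin{aligned}
f_{Z_{(M)}}(z) &= \frac{d}{dz}F_{Z_{(M)}}(z) \\
&= \frac{d}{dz}[F_Z(z)]^M \\
&= M[F_Z(z)]^{M-1}f_Z(z)
\end{aligned}$$

where $f_Z(z) = \dfrac{dF_Z(z)}{dz}$.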

Consider the limiting distribution function of the
maximum $Z_{(M)}$ as M tends to infinity. [Ref. 9, 10] show this
distribution is

$$\lim_{M \to \infty}[F_Z(z)]^M = \exp\left(-e^{-\sqrt{2\log M}\,\left(z - \sqrt{2\log M}\right)}\right) \tag{3.8}$$

if $Z_1, Z_2, \ldots, Z_M$ is a random sample from a standard Normal
population. We want to find a bound $\Delta$ on the largest of the $M^2$
errors between estimates in $\hat{P}$ and the unknown components of
P. The random variables $\hat{p}_{ij} - p_{ij}$ are very roughly Normal with
mean 0 and variance $\frac{M}{4K_{..}}$, which is derived from equation
(3.1) for i, j = 1,2,...,M. Recall that $K_{..}$ is the total
number of transitions observed.

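Let $X_l = \hat{p}_{ij} - p_{ij}$, where $l = 1,2,\ldots,M^2$. Then we know the
random variable $X_l$ is approximately equal to $\frac{1}{2}\sqrt{M/K_{..}}\,Z$, where
Z has a standard Normal distribution. Let the random
variable $X_{(M^2)}$ be equal to $\max|\hat{p}_{ij} - p_{ij}|$. Then

$$\begin{aligned}
\lim_{M \to \infty} P[X_{(M^2)} \le \Delta] &= \lim_{M \to \infty} P[\,\max|\hat{p}_{ij} - p_{ij}| \le \Delta\,] \\
&= \lim_{M \to \infty} P[\,\text{smallest of } (\hat{p}_{ij} - p_{ij}) \ge -\Delta \text{ and largest of } (\hat{p}_{ij} - p_{ij}) \le \Delta\,] .
\end{aligned}$$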
Now $X_{(1)}$ and $X_{(M^2)}$ are asymptotically independent, so for large
M,

$$\begin{aligned}
\lim P[X_{(M^2)} \le \Delta]
&= \{P[X_1 \ge -\Delta] \cdots P[X_{M^2} \ge -\Delta]\} \cdot [F_X(\Delta)]^{M^2} \\
&= [1 - F_X(-\Delta)]^{M^2}\,[F_X(\Delta)]^{M^2} .
\end{aligned} \tag{3.9}$$


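From equation (3.9) we derive an expression for $\Delta$ as
follows. Let $\Delta$ be the largest value for which

$$P[X_{(M^2)} \le \Delta] \le 1 - \alpha .$$

This is the complementary probability because we wish to
have $P[\,|\hat{p}_{ij} - p_{ij}| \ge \Delta$ for some i and j$\,] \ge \alpha$, as in the
previous section. By the symmetry of the Normal distribution,
$1 - F_X(-\Delta) = F_X(\Delta)$, so the limiting distribution function of the
maximum $X_{(M^2)}$ is the same as

$$\lim_{M \to \infty}[F_X(\Delta)]^{2M^2} = \exp\left(-e^{-\sqrt{2\log 2M^2}\,\left(2\Delta\sqrt{K_{..}/M} - \sqrt{2\log 2M^2}\right)}\right) .$$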

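Then, approximately,

$$\log(1 - \alpha) = -e^{-\sqrt{2\log 2M^2}\,\left(2\Delta\sqrt{K_{..}/M} - \sqrt{2\log 2M^2}\right)}$$

and

$$\log\{-\log(1 - \alpha)\} \approx -\sqrt{2\log 2M^2}\left(2\Delta\sqrt{K_{..}/M} - \sqrt{2\log 2M^2}\right) .$$

Finally,

$$\Delta \approx \frac{1}{2}\sqrt{\frac{M}{K_{..}}}\left[\sqrt{2\log 2M^2} - \frac{\log\{-\log(1 - \alpha)\}}{\sqrt{2\log 2M^2}}\right] . \tag{3.10}$$

Equation (3.10) is an approximate expression for $\Delta$ based on
the asymptotic distribution of the extreme order statistic.
We will compare the central limit theorem $\Delta$'s with those
obtained with the extreme value distribution, in the next
section.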
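The extreme value bound can also be evaluated directly. The sketch below assumes the form $\Delta \approx \frac{1}{2}\sqrt{M/K_{..}}\,\big[c - \log\{-\log(1-\alpha)\}/c\big]$ with $c = \sqrt{2\log 2M^2}$ and natural logarithms; this form and the function name `delta_extreme_value` are assumptions of the sketch.

```python
from math import log, sqrt

def delta_extreme_value(alpha, M, K_total):
    """Bound on the largest of the M*M errors from the asymptotic extreme value law."""
    c = sqrt(2 * log(2 * M**2))           # normalizing constant for 2*M^2 Normal variates
    z = c - log(-log(1 - alpha)) / c      # standardized bound
    return 0.5 * sqrt(M / K_total) * z    # rescale by the worst-case standard deviation

# Illustrative inputs: alpha = 0.90, K.. = 5000 observed transitions.
e_20 = delta_extreme_value(0.90, 20, 5000)
e_30 = delta_extreme_value(0.90, 30, 5000)
print(round(e_20, 3), round(e_30, 3))
```

With 20 to 30 states and 5000 observed transitions, this puts the largest estimation error at roughly 0.1 or more.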


C. COMPARISON OF THE THREE EXPRESSIONS

The three expressions for $\Delta$ obtained using the central
limit theorem and order statistics have been developed under
approximations such as: 1) the steady state distribution
of $\{X_t\}$ is $\frac{1}{M}$ (equally likely), 2) the variances of
$|\hat{p}_{ij} - p_{ij}|$ have $\frac{M}{4K_{..}}$ as a maximum value (worst case), and 3)
all transitions are independent. Information about $\{X_t\}$ is
from the estimate $\hat{P}$ because we don't have information about
the unknown P. In view of the above approximations and
computations, our expressions for $\Delta$ are very rough. However,
they do provide some insight into the occurrences and sizes
of estimation errors in $\hat{P}$.

Figure 3.1 contains 3 graphs showing $\Delta$ as a function of
$K_{..}$ and M for fixed $\alpha = 0.90$ based on the three expressions
(3.3), (3.7) and (3.10).

The first graph shows error bounds using the central limit
theorem on $\hat{p}_{ij}$ for fixed i and j. The second graph is
given by the same approach as the first graph, except
overall estimation errors are considered, for all i and j.
The third graph is based on the asymptotic distribution of
the largest value of $|\hat{p}_{ij} - p_{ij}|$ over all i and j.

From Figure 3.1 we see that the largest estimation error
depends very much on the number of transition observations
and matrix size, but not so much on the $\alpha$ value, as seen
from Figure 3.2.
[Figure 3.2 Variation of $\Delta$ by Changing Alpha ($\alpha$). Two panels,
"Overall estimation by using C.L.T." and "Overall estimation by using
order statistics", for $K_{..} = 5000$; horizontal axis: size (M) of
transition matrix; legend: curves 1 to 4 for values of alpha from 0.50 to 0.95.]
Graphs 2 and 3 in Figure 3.1 are very similar even though
they use different approaches. They give an idea of how
large the likely values of $\Delta$ are for given $K$ and $M$, in the
"worst case".

If we consider a Markov chain $\{X_t\}$ with M = 20 or 30
states, and we have observed K = 5000 transitions, then,
roughly, it is likely (prob = 0.90) that at least one
element of $\hat{P}$ is in error by at least 0.1. In general,
expressions (3.7) and (3.10) may be useful for Markov chains
$\{X_t\}$ with M = 20 or 30 states and large numbers of observed
transitions.
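
The dependence of the largest estimation error on K and M described
in this section can be explored with a small simulation. The sketch
below is an illustration only and is not the computation behind
expressions (3.3), (3.7) or (3.10). It assumes a hypothetical chain in
which every transition probability equals 1/M, and the function names
are invented for this example. Because 1/M is not the worst-case
variance assumed in the expressions, the simulated errors will
typically be smaller than the worst-case values discussed above.

```python
import random


def uniform_chain(M):
    """Hypothetical M-state chain with every transition probability 1/M."""
    return [[1.0 / M] * M for _ in range(M)]


def simulate_max_error(P, K, rng):
    """Simulate K transitions of the chain with matrix P, estimate P by
    row-normalised transition counts, and return max |p_hat_ij - p_ij|."""
    M = len(P)
    counts = [[0] * M for _ in range(M)]
    state = rng.randrange(M)
    for _ in range(K):
        nxt = rng.choices(range(M), weights=P[state])[0]
        counts[state][nxt] += 1
        state = nxt
    worst = 0.0
    for i in range(M):
        n_i = sum(counts[i])
        if n_i == 0:
            continue  # state i was never left, so row i has no estimate
        for j in range(M):
            worst = max(worst, abs(counts[i][j] / n_i - P[i][j]))
    return worst


rng = random.Random(1)  # fixed seed so that a run is repeatable
for M in (5, 20):
    for K in (500, 5000, 50000):
        err = simulate_max_error(uniform_chain(M), K, rng)
        print(f"M={M:2d}  K={K:5d}  largest error={err:.4f}")
```

Each printed line is a single realisation, so repeated runs with
different seeds are needed before drawing any conclusion about how
the largest error varies with K and M.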

D.  SENSITIVITY OF LUMPING CONDITIONS

We have developed expressions for $\Delta$, using the central
limit theorem and order statistics. We now examine the
sensitivity of the lumping conditions applied to $\hat{P}$, the
estimate of $P$. If equation (2.1), which is a necessary
condition for lumping a Markov chain with transition matrix
$P$, is satisfied, then the lumped transition matrix is
given by equation (2.2). However, even though $P$ satisfies
the lumping conditions, it is extremely unlikely that its
estimate $\hat{P}$ will also satisfy these conditions, as we shall
now demonstrate.

In order to simulate the difference between $\hat{P}$ and $P$,
consider a matrix of errors $R\Delta$, where $R$ is a random matrix
with the same dimension as $P$, whose components are 1's,
-1's, and 0's, and in which the sum of each row is zero. Now
consider the lumpability of the simulated estimate $P^*$, which
is constructed by taking $P$ plus the random matrix $R$ times
$\Delta$, that is, $P^* = P + R\Delta$.

To show the sensitivity of the lumping conditions, we
assume the unknown $P$ is lumpable with lumping matrix $B$, and
consider the difference $(BAP^*B - P^*B)$. If equation (2.1) is
satisfied by $P^*$ then all of its components must be zero.


Theorem 8. The difference $(BAP^*B - P^*B)$ is a linear
function of $\Delta$.

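Proof. Let $R$ be the random matrix as defined above and
let $P^* = P + R\Delta$. Then $(BAP^*B - P^*B)$ is given by

$$
\begin{aligned}
BA(P + R\Delta)B - (P + R\Delta)B &= BAPB + BARB\,\Delta - PB - RB\,\Delta \\
&= (BAPB - PB) + (BARB - RB)\,\Delta \\
&= (BARB - RB)\,\Delta \\
&= c\,\Delta ,
\end{aligned}
$$

where the third equality uses $BAPB = PB$, which holds because $P$ is
lumpable, and $c = BARB - RB$ does not depend on $\Delta$.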

Therefore the difference $BAP^*B - P^*B$ is linearly dependent
on $\Delta$, and $P^*$ is not lumpable unless $BARB = RB$ (i.e., $R$
is "lumpable"), which is not likely to occur.
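
The linearity in Theorem 8 can be checked numerically. The sketch
below rests on several assumptions that are not taken from the text:
$B$ is taken to be the M x m indicator matrix of the lumps, $A$ the
m x M matrix whose rows are uniform over each lump, and the 4-state
matrix `P` is a made-up example that is lumpable with respect to the
partition {0, 1}, {2, 3}. The helper names are likewise hypothetical.

```python
import random


def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]


def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def sub(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def scale(X, s):
    return [[x * s for x in row] for row in X]


def max_abs(X):
    return max(abs(x) for row in X for x in row)


def lumping_residual(P, A, B):
    """Return BAPB - PB, which is all zeros when condition (2.1) holds."""
    PB = matmul(P, B)
    return sub(matmul(B, matmul(A, PB)), PB)


def random_sign_matrix(n, rng):
    """n x n matrix with one +1 and one -1 per row, so each row sums to 0."""
    R = []
    for _ in range(n):
        row = [0] * n
        i, j = rng.sample(range(n), 2)
        row[i], row[j] = 1, -1
        R.append(row)
    return R


# Hypothetical lumpable chain: within each lump, the rows put the same
# total probability on lump {0, 1} and on lump {2, 3}.
P = [[0.1, 0.3, 0.2, 0.4],
     [0.2, 0.2, 0.5, 0.1],
     [0.3, 0.3, 0.1, 0.3],
     [0.5, 0.1, 0.2, 0.2]]
B = [[1, 0], [1, 0], [0, 1], [0, 1]]        # assumed lumping matrix
A = [[0.5, 0.5, 0, 0], [0, 0, 0.5, 0.5]]    # assumed uniform weights

R = random_sign_matrix(4, random.Random(0))
c = lumping_residual(R, A, B)               # c = BARB - RB

for delta in (0.01, 0.05, 0.1):
    P_star = add(P, scale(R, delta))        # P* = P + R * delta
    resid = lumping_residual(P_star, A, B)  # BAP*B - P*B
    gap = max_abs(sub(resid, scale(c, delta)))
    print(f"delta={delta}: max|residual|={max_abs(resid):.4f}, "
          f"max|residual - c*delta|={gap:.1e}")
```

The last printed quantity should be at rounding-error level for every
delta, reflecting that the residual equals $c\,\Delta$ once $BAPB = PB$.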

Since $\hat{P}$ is likely to have elements differing appreciably
from the corresponding elements in $P$ (errors of size $\Delta$), it
can be seen that the lumpability conditions will not be
satisfied (not even nearly so) by $\hat{P}$, even though $\{X_t\}$ is
lumpable. We conclude that attempting to check the
lumpability of the estimate $\hat{P}$ when $P$ is not known is not useful.


IV.  SUMMARY AND CONCLUSIONS

We have given several theorems associated with eigenvalues
and eigenvectors for lumpable Markov chains with
finite state spaces. We have derived rough, approximate
mathematical expressions for the largest error made in
estimating $P$ by $\hat{P}$ based on transition data.

Expressions (3.7) and (3.10) give very similar results, although
the estimated values of $\Delta$ from the first expression are
slightly less than those from the second. These
expressions show that the largest estimation errors depend
very much on the number of transition observations and on
the matrix size, but not so much on the value of $\alpha$.

Since $\hat{P}$ is likely to have elements differing appreciably
from the corresponding elements in $P$, it is of interest to
examine whether the equation $BAPB = PB$ is likely to be
nearly satisfied with $\hat{P}$ in place of $P$, i.e., will
$(BA\hat{P}B - \hat{P}B)$ be nearly zero? This is examined by
simulation of "estimates" $P^*$ of $P$, using random perturbations
of elements of $P$ of sizes $\Delta$ which are likely to occur as
errors in $\hat{P}$.

This shows that the classical lumping conditions are
extremely sensitive to estimation errors which can be
expected to occur even when a large number of transitions
have been observed. Thus, the classical lumping conditions
may be of limited value in many actual applications.

As further research, it is recommended that some
constructive approach to finding matrices $B$ for lumping a
lumpable Markov chain $\{X_t\}$ be developed, perhaps along the
lines of the theorems mentioned in Chapter 2. It is hoped
that the present study will be useful to those who might
otherwise have endeavored to check the classical condition
for lumpability of a Markov chain when the transition
matrix $P$ has been estimated.